Dynamic Processing Slots Scheduling for I/O Intensive Jobs of Hadoop on Pathology Data

Authors

  • K. H Naz
  • Anisha Rodrigues
Abstract

The increasing use of computing resources in our daily lives leads to data generation at an astonishing rate. The computing industry is repeatedly questioned about its ability to accommodate this unpredictable growth, which has encouraged the development of large-scale data platforms. Hadoop, which consists of Hadoop MapReduce and the Hadoop Distributed File System (HDFS), is one such platform for large-scale data storage and processing. As the amount of data worldwide grows rapidly and the scale of processing increases, distributed processing has become common, and Hadoop has attracted many cloud computing enterprises and technology enthusiasts; its user base keeps expanding. This study aims to speed up the execution of jobs originated by Hadoop and Hive. The proposed work introduces dynamic processing slot scheduling for I/O-intensive Hadoop MapReduce jobs, focusing on I/O wait during execution, so that pathology data can be processed efficiently on Hadoop clusters or clouds. Assigning additional tasks to newly freed slots whenever CPU resources with a high rate of I/O wait are detected on an active TaskTracker node yields a 30% improvement in CPU performance. A minimal sketch of this idea follows.
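As a rough illustration of the mechanism, the sketch below samples the CPU's iowait share from /proc/stat on a Linux node and advertises extra processing slots once I/O wait crosses a threshold. The class name, the threshold, the base slot count, and the slot formula are all assumptions made for this sketch; the paper's actual TaskTracker integration is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Minimal sketch of the abstract's idea: sample CPU I/O-wait from
 * /proc/stat and, when it exceeds a threshold, report extra processing
 * slots that a TaskTracker-like node could expose. All names, the
 * threshold, and the slot formula are illustrative assumptions, not the
 * paper's actual implementation.
 */
public class DynamicSlotSketch {
    private static final double IOWAIT_THRESHOLD = 0.30; // assumed trigger
    private static final int BASE_SLOTS = 2;             // assumed default
    private static final int EXTRA_SLOTS = 2;            // assumed cap

    // Returns {total jiffies, iowait jiffies} from the aggregate cpu line.
    static long[] readCpuJiffies() throws IOException {
        String cpuLine = Files.lines(Paths.get("/proc/stat"))
                              .filter(l -> l.startsWith("cpu "))
                              .findFirst().orElseThrow(IOException::new);
        String[] f = cpuLine.trim().split("\\s+");
        long total = 0;
        for (int i = 1; i < f.length; i++) total += Long.parseLong(f[i]);
        long iowait = Long.parseLong(f[5]); // 5th field after "cpu" is iowait
        return new long[]{total, iowait};
    }

    public static void main(String[] args) throws Exception {
        long[] before = readCpuJiffies();
        Thread.sleep(1000);                 // sampling interval
        long[] after = readCpuJiffies();

        double iowaitRatio =
            (double) (after[1] - before[1]) / (after[0] - before[0]);

        // If cores are mostly blocked on I/O, advertise extra slots so the
        // scheduler can overlap more tasks with the outstanding I/O.
        int slots = BASE_SLOTS;
        if (iowaitRatio > IOWAIT_THRESHOLD) {
            slots += EXTRA_SLOTS;
        }
        System.out.printf("iowait=%.2f -> advertise %d slots%n",
                          iowaitRatio, slots);
    }
}
```

In a real deployment, a decision like this would presumably feed into the TaskTracker's heartbeat to the JobTracker, so the scheduler can assign more tasks while existing ones are blocked on disk.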


Similar Resources

Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds

MapReduce has become a popular programming model for running data intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing environments like Hadoop. There is a conflict between scheduling MR jobs to meet deadlines and "data locality" (assigning tasks to nodes that contain their input da...
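To make the stated conflict concrete, here is a toy decision rule, with all constants and estimates invented for illustration (none of this is the cited work's algorithm): wait for a data-local slot only while the expected wait fits inside the job's deadline slack and undercuts the cost of moving the input.

```java
/**
 * Toy illustration of the deadline-vs-locality trade-off: waiting for a
 * data-local slot saves transfer time but burns schedule slack. Every
 * value here is a made-up assumption for the sketch.
 */
public class LocalityDeadlineSketch {
    static boolean preferLocalSlot(double slackSeconds,
                                   double expectedLocalWait,
                                   double remoteTransferPenalty) {
        // Wait for a local slot only if the expected wait still leaves
        // room before the deadline and beats the cost of moving the data.
        return expectedLocalWait < slackSeconds
            && expectedLocalWait < remoteTransferPenalty;
    }

    public static void main(String[] args) {
        // Example: 60s of slack, ~5s until a local slot frees up,
        // ~20s to stream the input split over the network.
        System.out.println(preferLocalSlot(60, 5, 20));  // true: wait
        System.out.println(preferLocalSlot(3, 5, 20));   // false: run remote
    }
}
```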


Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop Distributed File System's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
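A minimal sketch of the alternative this line of work motivates, with node names and capacities invented for illustration: distribute blocks in proportion to each node's measured computing capacity rather than uniformly, so all nodes finish their local work at roughly the same time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of capacity-proportional data placement for a heterogeneous
 * cluster. Node names, capacities, and block counts are illustrative
 * assumptions, not the cited paper's algorithm.
 */
public class CapacityAwarePlacement {
    public static void main(String[] args) {
        Map<String, Double> capacity = new LinkedHashMap<>();
        capacity.put("fast-node", 4.0);   // e.g. relative tasks/minute
        capacity.put("mid-node", 2.0);
        capacity.put("slow-node", 1.0);

        int totalBlocks = 700;
        double totalCapacity =
            capacity.values().stream().mapToDouble(Double::doubleValue).sum();

        // Allocate blocks in proportion to capacity instead of uniformly.
        for (Map.Entry<String, Double> e : capacity.entrySet()) {
            long blocks = Math.round(totalBlocks * e.getValue() / totalCapacity);
            System.out.printf("%s -> %d blocks%n", e.getKey(), blocks);
        }
    }
}
```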


Preemptive ReduceTask Scheduling for Fair and Fast Job Completion

Hadoop MapReduce adopts a two-phase (map and reduce) scheme to schedule tasks among data-intensive applications. However, under this scheme, Hadoop schedulers do not work effectively for both phases. We reveal that there exists a serious fairness issue among jobs of different sizes, leading to prolonged execution for small jobs, which starve waiting for reduce slots held by large jobs. To solve t...
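The core fairness test behind such preemption could look like the following sketch; the simple fair-share rule and all fields are assumptions made for illustration, not the cited mechanism itself.

```java
/**
 * Sketch of a fairness check for preemptive reduce scheduling: if one job
 * holds more reduce slots than its fair share while another job starves,
 * one of its tasks becomes a preemption candidate. All fields and the
 * share rule are illustrative assumptions.
 */
public class ReducePreemptionSketch {
    static class Job {
        final String name;
        final int runningReduces;
        final int pendingReduces;
        Job(String name, int running, int pending) {
            this.name = name;
            this.runningReduces = running;
            this.pendingReduces = pending;
        }
    }

    // Fair share here is simply total slots divided by active jobs.
    static boolean shouldPreempt(Job holder, Job starving,
                                 int totalSlots, int activeJobs) {
        int fairShare = totalSlots / activeJobs;
        return holder.runningReduces > fairShare
            && starving.runningReduces == 0
            && starving.pendingReduces > 0;
    }

    public static void main(String[] args) {
        Job large = new Job("large-job", 8, 40);
        Job small = new Job("small-job", 0, 2);
        // 8 slots and 2 jobs -> fair share is 4; the large job exceeds it
        // while the small job starves, so preemption is warranted.
        System.out.println(shouldPreempt(large, small, 8, 2)); // true
    }
}
```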


Queuing Network Models to Predict the Completion Time of the Map Phase of MapReduce Jobs

Big Data processing is generally defined as a situation where the size of the data itself becomes part of the computational problem. This has made divide-and-conquer type algorithms implemented in clusters of multi-core CPUs in Hadoop/MapReduce environments an important data processing tool for many organizations. Jobs of various kinds, which consist of a number of automatically parallelized ta...
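For context, a standard first-order estimate of map-phase completion time under greedy scheduling is shown below; it illustrates the kind of prediction such models produce, but it is not necessarily the queuing network model of the cited paper.

```latex
% Bounds on the map-phase makespan for n independent tasks executed on
% k slots, with mean task duration \mu and maximum duration \lambda.
% A classic greedy-scheduling estimate, given only for illustration.
T_{\text{low}} \approx \frac{n\,\mu}{k},
\qquad
T_{\text{up}} \approx \frac{(n-1)\,\mu}{k} + \lambda
```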


Reliable and Locality Driven Scheduling in Hadoop

The increasing use of computing resources in our daily lives leads to data being generated at an unprecedented rate. The computing industry is being repeatedly questioned for its ability to accommodate the unpredictable growth rate of data, and for its ability to process it. This has encouraged the development of cluster based data-intensive applications. Hadoop is a popular open source framework k...



Journal:

Volume:   Issue:

Pages: -

Publication date: 2014